README FOR "Variable Selection with Scalable Bootstrapping in Generalized Linear Model for Massive Data"
Manuscript ID: JDS2202-010

Contact Info: Zhang Zhang 
Please send any questions to zzruc@ruc.edu.cn 

The code to reproduce the simulation results is located in this folder.


#######################################################################################
# Simulation scripts are stored in the /code and data/ directory. From here on, it is assumed that this is the working directory.

- 01 lasso_lm_beta=1.R
This script creates the folder "01 results_lasso_lm_p=35_beta=1". Under the setting of Example 1, it writes the estimated coefficients (beta) and run times for BootVS and BLBVS, with the corresponding gamma values, into this folder as .csv files.

plotConverge.R and plotCutoff.R read these .csv files to produce the plots.

- 02 lasso_logistic_beta=1.R
Similar to 01 lasso_lm_beta=1.R, except that the setting is Example 2.

- 03 grplasso_lm_beta=1.R
Similar to 01 lasso_lm_beta=1.R, except that the setting is Example 3.

- 04 grplasso_logistic_beta=1.R
Similar to 01 lasso_lm_beta=1.R, except that the setting is Example 4.

- 05 Heatmap_lasso_lm_beta=1.R
This script creates the folder "05 Heatmap_lasso_lm_beta=1".

- 06 heatmap_grplasso_logistic_beta=1.R
This script creates the folder "06 Heatmap_grp_logistic_beta=1".

- 07 lasso_lm_beta=1_parallel.R
Similar to 01 lasso_lm_beta=1.R, except that it follows the setting of Section 4.


- functions_refit_bic.R
This file contains all the helper functions for the simulations and the real data application.

- hmplot.R
This script reads the .csv files in the folders "05 Heatmap_lasso_lm_beta=1" and "06 Heatmap_grp_logistic_beta=1" and plots the heatmaps.
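
The workflow described above (run a simulation script, then read the .csv files it wrote) might be sketched as follows. This is only an illustration: the helper read_results is hypothetical and not part of this repository, and it simply collects whatever .csv files it finds, because the individual file names are not listed in this README.

```r
# Hypothetical helper (not part of this repository): read every .csv file
# in a results folder into a named list of data frames.
read_results <- function(folder) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  results <- lapply(files, read.csv)
  names(results) <- basename(files)
  results
}

# Assumed usage: run the simulation first, e.g.
#   source("01 lasso_lm_beta=1.R")
# and then inspect the output folder it created, if it exists.
folder <- "01 results_lasso_lm_p=35_beta=1"
if (dir.exists(folder)) {
  res <- read_results(folder)
  str(res, max.level = 1)  # one data frame per .csv file
}
```

Because the script and folder names contain spaces and "=" characters, they need to be quoted as shown when passed to source() or list.files().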

#######################################################################################
REAL DATA APPLICATION

- 08 bikeshare.R
This script creates the folder "08 results_bike" and writes the estimated coefficients (beta) and run times for BootVS and BLBVS, with the corresponding gamma values, into this folder as .csv files.

- bikehour.csv: This is the bike sharing dataset.

##
- 09 lendingClub.R
This script creates the folder "09 results_loan" and writes the estimated coefficients (beta) and run times for BootVS and BLBVS, with the corresponding gamma values, into this folder as .csv files.


- lc16_19_sample2000.csv: Please note that the original Lending Club dataset is too large to share, so we share a random sample of 2000 observations here. The original Lending Club dataset is publicly available.
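
Drawing a random subsample of 2000 observations, as described above, could look roughly like the sketch below. It is not the authors' actual procedure: the seed, the toy data frame full_data standing in for the original dataset, and the output file name are all assumptions made for illustration.

```r
# Toy stand-in for the full dataset (the real data are not included here).
set.seed(1)  # assumed seed, only to make this illustration reproducible
full_data <- data.frame(id = 1:10000, x = rnorm(10000))

# Draw 2000 row indices without replacement and keep those rows.
idx <- sample(nrow(full_data), size = 2000)
subsample <- full_data[idx, ]

# Write the subsample to a temporary file (hypothetical output name).
out_file <- file.path(tempdir(), "subsample_2000.csv")
write.csv(subsample, out_file, row.names = FALSE)
```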


